METHOD AND SYSTEM FOR COMPOSITING THREE-DIMENSIONAL
GRAPHICS IMAGES USING ASSOCIATIVE DECISION MECHANISM

Field of the Invention 

The present invention relates to the field of computer graphics rendering. 
More particularly, the invention relates to a method and apparatus for the
re-composition of multiple three-dimensional/depth raster images into a
two-dimensional image.

Cross-Reference to Related Applications 

This application claims the benefit of U.S. Provisional Patent Application No.
60/442,750 filed on January 28, 2003, the entire disclosure of which is 
incorporated herein by reference. 

Background of the Invention

As with many types of information processing implementations, there is an
ongoing effort to improve performance of computer graphics rendering. One of 
the attractive attempts to improve rendering performance is based on using 
multiple graphic processing units (GPUs) harnessed together to render in 
parallel a single scene. 

There are three predominant methods for rendering graphic data with 
multiple GPUs. These include Time Domain Composition, in which each GPU 
renders the next successive frame, Screen Space Composition, in which each
GPU renders a subset of the pixels of each frame, and Scene based 
Composition, in which each GPU renders a subset of the database. 

In Time Domain Composition each GPU renders the next successive frame. A
major disadvantage of this method is that each GPU renders an entire
frame. Thus, the speed at which each frame is rendered is limited to the
rendering rate of a single GPU. While multiple GPUs enable a higher frame
rate, Time Domain Composition applications can introduce a delay (i.e.,
impaired latency) in the response time of the system to the user's input.
These delays typically occur because at any given time only one GPU is
engaged in displaying a rendered frame, while all the other GPUs are in the
process of rendering one of a series of frames in a sequence. In order to
maintain a steady frame rate, the system delays acting on the user's input
until the specific GPU that first received the user's input cycles through the
sequence and is again engaged in displaying its rendered frame. In practical
applications, this condition serves to limit the number of GPUs that are used
in a system.



Another difficulty associated with Time Domain Composition applications is
related to the large data sets that each GPU should be able to access, since in
these applications each GPU should be able to gain access to the entire data
used for the image rendering. This is typically achieved by maintaining
multiple copies of large data sets in order to prevent possible conflicts due to
multiple attempts to access a single copy.

Screen Space Composition applications have a similar problem in the 
processing of large data sets, since each GPU must examine the entire data 
base to determine which graphic elements fall within its part of the screen.
The system latency in this case is equivalent to the time required for 
rendering a single frame by a single GPU. 

The Scene Composition methods, to which the present invention relates,
avoid the aforementioned latency problems, the requirement of
maintaining multiple copies of data sets, and the problems involved in
handling the entire database by each GPU.

The Scene Composition methods are well suited to applications requiring the
rendering of a huge amount of geometrical data. Typically these are CAD
applications, and comparable visual simulation applications, considered as
"viewers," meaning that the data have been pre-designed such that their
three-dimensional positions in space are not under the interactive control of
the user. However, the user does have interactive control over the viewer's
position, the direction of view, and the scale of the graphic data. The user also
may have control over the selection of a subset of the data and the method by
which it is rendered. This includes manipulating the effects of image lighting,
coloration, transparency and other visual characteristics of the underlying
data.



In CAD applications, the data tends to be very complex, as it usually consists
of a massive amount of geometry entities in the display list or vertex array.
Therefore the construction time of a single frame tends to be very long (e.g.,
typically 0.5 sec for 20 million polygons), which in turn slows down the
overall system response.



Scene Composition (e.g. object based decomposition) methods are based on 
the distribution of data subsets among multiple GPUs. The data subsets are 
rendered in the GPU pipeline, and converted to Frame Buffer (FB) of 
fragments (sub-image pixels). The multiple FB's sub-images have to be 
merged to generate the final image to be displayed. As shown in Fig. 1, for 
each pixel in the X/Y plane of the final image there are various possible 
values corresponding to different image depths presented by the FBs' sub- 
images. 



Each GPU produces at most one pixel 12 at each screen (X/Y) coordinate.
This composed pixel 12 is a result of the removal of hidden surfaces and the
shading and color blending needed for effectuating transparency. Each of the
pixels 12 generated by the GPUs holds a different depth measure (Z-value),
and these have to be resolved for the highest Z (the closest to the viewer). Only
one pixel is finally allowed through. The merging of the sub-images of the FBs
is the result of determining which value (10) from the various possible pixel
values 12 provided by the FBs represents the closest point that is visible from
the viewer's perspective. However, the merging of the partial scene data into one
single raster still poses a performance bottleneck in the prior art.
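
The merging rule described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware of the invention: the names `Pixel` and `composite` are hypothetical, each sub-image is assumed to be a flat list with one (Z-value, RGB) entry per raster position, and the highest Z is taken as the closest to the viewer, as stated in the text.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Pixel = Tuple[int, RGB]  # (Z-value, content); hypothetical layout for illustration


def composite(sub_images: List[List[Pixel]]) -> List[RGB]:
    """For every raster position keep the content of the pixel with the highest Z."""
    n_pixels = len(sub_images[0])
    final_image = []
    for xy in range(n_pixels):
        # One candidate pixel per frame buffer at this X/Y position.
        candidates = [fb[xy] for fb in sub_images]
        _, rgb = max(candidates, key=lambda pixel: pixel[0])
        final_image.append(rgb)
    return final_image


fb0 = [(5, (255, 0, 0)), (1, (0, 255, 0))]
fb1 = [(3, (0, 0, 255)), (9, (9, 9, 9))]
merged = composite([fb0, fb1])
```

Here the first raster position is taken from `fb0` and the second from `fb1`, since those hold the higher Z-values.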

The level of parallelism in the prior art is limited, due to the inadequacies in
the composition performance of multiple rasters. The composition of two
rasters is usually performed by Z-buffering, which is a hardware technique
for performing hidden surface elimination. In the conventional methods of the
prior art Z-buffering allows merging of only two rasters at a time.

Conventional hardware compositing techniques, as exemplified in Fig. 2A, are
typically based on an iterative collating process of pairs of rasters (S. Molnar
"Combining Z-buffer Engines for Higher-Speed Rendering," Eurographics,
1988), or on pipelined techniques (J. Eyles et al. "PixelFlow: The Realization,"
ACM Siggraph, 1997). The merging in these techniques is carried out within
$\log_2 R$ steps, of S stages, wherein R is the number of rendering GPUs. In the
collating case, the time needed to accomplish comparison between two depth
measures at each such comparator (MX) is $\log_2 Z$, where Z is the depth
domain of the scene. E.g. for typical depth buffers with 24 bits per pixel, the
comparison between two Z-buffers is typically performed in 24 clock cycles.

Since in the prior art techniques the merging of only two Z-buffers is allowed
at a time, composition of multiple rasters is made in a hierarchical fashion.
The complexity of these composition structures is $O(\log_2 R)$, making the
performance highly affected by R, the number of graphic pipelines. For
growing values of R the compositing time exceeds the allocated time slot for
real time animation. In practical applications, this condition serves to limit
the number of GPUs that are used in a system. Fig. 2B shows the theoretical
improvement of performance by increasing parallelism. The composition time
grows by the factor of the complexity, $O(\log_2 R)$. The aggregated time starts
increasing at (e.g.) 16 pipelines. Obviously, in this case there is no advantage
in increasing the level of parallelism beyond 16.
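
To make the hierarchical scheme concrete, the following Python sketch merges depth values two at a time and counts the number of tree stages. It is a simplified software model under stated assumptions: the function name `hierarchical_composite` is hypothetical, a single depth value stands in for a whole raster, and the input list is assumed to be non-empty.

```python
def hierarchical_composite(z_values):
    """Merge depth values pairwise (two at a time) and count the tree stages."""
    level = list(z_values)
    stages = 0
    while len(level) > 1:
        # Each comparator merges at most two inputs, as in pairwise Z-buffering.
        level = [max(level[i:i + 2]) for i in range(0, len(level), 2)]
        stages += 1
    return level[0], stages


winner, stages = hierarchical_composite(range(16))
```

For 16 inputs the loop runs 4 times, matching the $\log_2 R$ growth of the number of stages with the number of pipelines R.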



Software techniques are usually based on compositing the output of R GPUs
by utilizing P general purpose processors (E. Reinhard and C. Hansen "A
Comparison of Parallel Compositing Techniques on Shared Memory
Architectures," Eurographics Workshop on Parallel Graphics and
Visualisation, Girona, 2000). However, these solutions typically require
utilizing (i) binary swap, (ii) parallel pipeline, and (iii) shared memory
compositor, which significantly increase the complexity and cost of such
implementations.



The most efficient implementation among the software techniques is the
Shared Memory Compositor method (known also as "Direct Send" on
distributed memory architectures). In this method the computation effort for
rendering the sub-images is increased by utilizing additional GPUs
(renderers), as shown in the block diagram of Fig. 3A and the pseudo code
shown in Fig. 3B. In the system illustrated in Fig. 3A, 2 compositors (CPUs,
$p_0$ and $p_1$) are operating concurrently on the same sub-images, which are
generated by 3 renderers (GPUs, $R_0$, $R_1$, and $R_2$). The computation task is
distributed between the CPUs, each performing composition of one half of the
same image. It is well-known that for any given number of GPUs one can
speed up the compositing by increasing the number of parallel compositors.



However, an increased number of renderers slows down the performance
severely. The complexity of this method is $O(N \cdot R / P)$, where N is the number
of pixels in a raster (image), R is the number of GPUs, and P is the number of
compositing units (CPUs, $p_i$). The compositing process in this technique is
completed within R-1 iterations. In the implementation of this technique on
SGI's Origin 2000 Supercomputer the compositing was carried out utilizing
CPUs. The results of the compositing performed by this system are shown in
Fig. 4. Fig. 4 demonstrates the overhead of this method: the compositing time
required for this system is over 6 times the time required for the rendering.
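
The following Python sketch illustrates where the N, R and P terms of this cost come from. It is not the pseudo code of Fig. 3B, which is not reproduced here; it is a sequential simulation under assumptions: the name `shared_memory_composite` is hypothetical, each sub-image is a list of depth values only, and each compositor is assumed to own a contiguous slice of the raster.

```python
def shared_memory_composite(sub_images, num_compositors):
    """Simulate P compositors, each merging its own slice of the R sub-images."""
    n = len(sub_images[0])   # N pixels per raster
    r = len(sub_images)      # R renderers
    result = [None] * n
    work = [0] * num_compositors  # comparisons performed by each compositor
    for p in range(num_compositors):
        start = p * n // num_compositors
        end = (p + 1) * n // num_compositors
        for xy in range(start, end):
            best = sub_images[0][xy]
            for k in range(1, r):  # R-1 iterations per pixel
                work[p] += 1
                if sub_images[k][xy] > best:
                    best = sub_images[k][xy]
            result[xy] = best
    return result, work


sub = [[1, 5, 2, 0], [4, 3, 9, 1], [2, 8, 7, 6]]
image, work = shared_memory_composite(sub, num_compositors=2)
```

With 3 renderers, 4 pixels and 2 compositors, each compositor performs 2 x 2 = 4 comparisons; adding renderers raises the per-compositor work, while adding compositors lowers it.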

All the methods described above have not yet provided satisfactory solutions
to the problems of the prior art methods for compositing large quantities of 
sub-images data into one image. 

It is an object of the present invention to provide a method and system for
rendering in parallel a plurality of sub-image frames within close to
real-time viewing.

It is another object of the present invention to provide a method and system 
for concurrently composing large amounts of sub-image data into a single 
image. 

It is a further object of the present invention to provide a method and system 
which substantially reduce the amount of time required for composing
sub-image data into a single image.

It is still another object of the present invention to provide a method and
apparatus for concurrently composing large amounts of sub-image data into a
single image that can be implemented efficiently as a semiconductor based
device.



It is a still further object of the present invention to provide a method and 
apparatus for composing sub-image data based on presenting a competition 
between the multiple sources of the sub-image data.

Other objects and advantages of the invention will become apparent as the
description proceeds. 

Summary of the Invention

In one aspect the present invention is directed to a method and system for
detecting the greatest number from a plurality of Numbers $Z_1, Z_2, \ldots, Z_R$.
Each of the Numbers is divided into two or more binary Segments
$Z_j^{(N-1)}, Z_j^{(N-2)}, \ldots, Z_j^{(0)}$, where the bit length of the Segments is determined
according to their level of significance and where sets of the Segments are
arranged according to their level of significance, wherein the first set of
Segments $Z_1^{(N-1)}, Z_2^{(N-1)}, \ldots, Z_R^{(N-1)}$ includes the Most Significant Segments of the
Numbers and the last set of Segments $Z_1^{(0)}, Z_2^{(0)}, \ldots, Z_R^{(0)}$ includes the Least
Significant Segments of the Numbers. In the first step, the numerical values
of the Segments $Z_1^{(i)}, Z_2^{(i)}, \ldots, Z_R^{(i)}$ having the same level of significance are
simultaneously compared, for determining a group designating the Numbers
whose Most Significant Segment has the greatest numerical value,
and evaluating for the Least Significant Segments a Grade indicating their
numerical size in comparison with the numerical value of the other Segments
of the same level of significance. In a second step, starting from the second
set of Segments $Z_1^{(N-2)}, Z_2^{(N-2)}, \ldots, Z_R^{(N-2)}$, the Grades of the Segments of the
Numbers which correspond to the group are compared, and Number
indications are removed from the group if their Grade is less than the highest
Grade which corresponds to another Number indication in the group. The
second step is repeated until the last set of Segments $Z_1^{(0)}, Z_2^{(0)}, \ldots, Z_R^{(0)}$ is
reached or until a single Number is designated by the group.
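
The two steps above can be illustrated with a short Python sketch. It is a software model under assumptions rather than the claimed mechanism: the names `split_segments` and `detect_greatest` are hypothetical, and the Grade of a Segment is modeled here simply as its rank among the distinct Segment values of the same level (the text only requires that a Grade indicate relative numerical size).

```python
def split_segments(z, seg_bits):
    """Split z into Segments, most significant first; seg_bits gives bit lengths."""
    segments = []
    shift = sum(seg_bits)
    for bits in seg_bits:
        shift -= bits
        segments.append((z >> shift) & ((1 << bits) - 1))
    return segments


def detect_greatest(numbers, seg_bits):
    """Return the indices of the Numbers still designated by the group at the end."""
    segs = [split_segments(z, seg_bits) for z in numbers]
    # First step: every level is graded independently (conceptually in parallel).
    grades = []
    for level in range(len(seg_bits)):
        column = [s[level] for s in segs]
        rank = {value: g for g, value in enumerate(sorted(set(column)))}
        grades.append([rank[value] for value in column])
    top = max(grades[0])
    group = [j for j, g in enumerate(grades[0]) if g == top]
    # Second step: walk the remaining levels, dropping lower-graded Numbers.
    for level in range(1, len(seg_bits)):
        if len(group) == 1:
            break
        top = max(grades[level][j] for j in group)
        group = [j for j in group if grades[level][j] == top]
    return group


numbers = [0x12345678, 0x12FF0000, 0x12FF0001, 0x11FFFFFF]
group = detect_greatest(numbers, seg_bits=(8, 8, 16))
```

In this example the Most Significant Segment removes the last Number, the second Segment removes the first, and the Least Significant Segment leaves index 2 as the single designated Number.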



Optionally, the Numbers are the depth values of pixels of multiple
three-dimensional raster images.

The detection of the greatest number may further comprise comparing the
Numbers with a threshold value and carrying out the detection of the
greatest number only with the Numbers whose value is above or below
the threshold value.



A similar detection may be carried out for determining the smallest number,
by designating by the group the Numbers whose Most Significant Segment has
the smallest numerical value and by removing from the group
Number designations whenever their Grade is greater than the smallest
Grade which corresponds to another Number indication in said group.

In one preferred embodiment of the invention, all the segments are of the
same bit length. Alternatively, the bit length of one or more of the Least 
Significant Segments is greater than the bit length of the Most Significant 
Segment. 



In another aspect, the present invention is directed to a method and system
for compositing a plurality of three-dimensional Sub-Images by examining
the Depth values $Z_1, Z_2, \ldots, Z_R$ of the Pixels corresponding to the same spatial
location in each Sub-Image and compositing the content of the Pixel having
the greatest Depth value. The Depth values are divided into two or more
binary Segments $Z_j^{(N-1)}, Z_j^{(N-2)}, \ldots, Z_j^{(0)}$, where the bit length of the Segments is
determined according to their level of significance and where sets of the
Segments are arranged according to their level of significance, wherein the
first set of Segments $Z_1^{(N-1)}, Z_2^{(N-1)}, \ldots, Z_R^{(N-1)}$ includes the Most Significant
Segments of the Depth values and the last set of Segments $Z_1^{(0)}, Z_2^{(0)}, \ldots, Z_R^{(0)}$
includes the Least Significant Segments of the Depth values. In a first step,
the numerical values of the Segments $Z_1^{(i)}, Z_2^{(i)}, \ldots, Z_R^{(i)}$ having the same level
of significance are simultaneously compared, and accordingly a group
designating the Depth values whose Most Significant Segment has the
greatest numerical value is determined, and a Grade is evaluated
for the Least Significant Segments indicating their numerical size in
comparison with the numerical value of the other Segments of the same level
of significance. In a second step, starting from the second set of Segments
$Z_1^{(N-2)}, Z_2^{(N-2)}, \ldots, Z_R^{(N-2)}$, the Grades of the Segments of the Depth values which
correspond to the group are compared, and Depth value indications are
removed from the group if their Grade is less than the highest Grade which
corresponds to another Depth value in the group. The second step is
repeated until the last set of Segments $Z_1^{(0)}, Z_2^{(0)}, \ldots, Z_R^{(0)}$ is reached or until a
single Depth value is designated by the group.

The detection of the greatest number may further comprise comparing
the Depth values with a threshold value and carrying out the detection
of the greatest number only with the Depth values whose value is
above or below the threshold value.

A similar detection may be carried out for determining the smallest
number, by designating by the group the Depth values whose Most
Significant Segment has the smallest numerical value and by removing
Depth value designations from the group whenever
their Grade is greater than the smallest Grade which corresponds to
another Depth value indication in said group.

In another preferred embodiment of the invention, all the segments
are of the same bit length. Alternatively, the bit length of one or more 
of the Least Significant Segments is greater than the bit length of the 
Most Significant Segment. 



The invention may be implemented on a single integrated circuit chip; for
instance, it may be a VLSI implementation.

Brief Description of the Drawings
In the drawings: 

- Fig. 1 is a block diagram illustrating the merging of a plurality of sub- 
images data into a single image; 

- Fig. 2A is a block diagram illustrating the prior art Hierarchical 
compositing method; 

- Fig. 2B graphically illustrates parallelism limitations of Hierarchical 
compositing performance; 

- Figs. 3A and 3B include a block diagram and a pseudo-code showing 
the Shared Memory Compositing methods of the prior art; 

- Fig. 4 graphically illustrates the performance of Shared Memory 
Composition methods of the prior art; 

- Fig. 5 is a block diagram exemplifying a preferred embodiment of the
invention; 

- Fig. 6 is a block diagram illustrating a promotion system according to
the invention; 

- Fig. 7A is a block diagram demonstrating the principle of the
Wired-AND function of the invention;

- Fig. 7B is a block diagram illustrating the principle of wired-AND 
competition of N binary numbers; 

- Fig. 8 is a block diagram illustrating a preferred embodiment of an 
Associative Unit; 

- Fig. 9 is a block diagram schematically illustrating the logic of a 
Primary Segment; 

- Fig. 10 is a block diagram schematically illustrating the logic of a
Non-Primary Segment;

- Fig. 11 is a block diagram schematically illustrating a reduced 
Promotion Matrix; 

- Fig. 12 exemplifies a competition process of 5 depth values;

- Fig. 13 is a block diagram illustrating a chip implementation of a 
preferred embodiment of the invention. 
Detailed Description of Preferred Embodiments

The present invention is directed to a method and system for
re-composition of multiple three-dimensional/depth raster images into a
two-dimensional image in an associative fashion. According to a preferred
embodiment of the invention the rendered graphics data (sub-images),
provided via multiple graphic pipelines, is resolved at each raster coordinate
for the closest pixel to the viewer. This task is accomplished by performing an
autonomous associative decision process at each pixel, simultaneously for all
pixels at a given raster coordinate by utilizing multiple Associative Units
(AU). The final image obtained by the composition outcome is outputted for
viewing. The present invention overcomes the excessive overhead of the
prior art methods, which are generally based on hierarchical combination of
images for viewing.



In principle, the present invention presents a competition for the highest
depth (Z) value among multiple sources. The highest depth value should be
determined by utilizing multiple AUs. Each AU continuously examines the
local depth value against the other values that are presented, and
autonomously decides whether to quit the competition against other AUs or
to further compete. In contrast to the conventional sorting methods, which
are of sequential nature, according to the present invention a decentralized
process can be performed in parallel, which substantially speeds up the
composition performance. Additional advantages of the present invention are:
(i) it can be performed on numbers of any length; and (ii) it suits any number
of sources, without diminishing the performance.

Fig. 5 provides a general illustration of the composing mechanism of the
invention. The Composition System 51 of the invention is fed with the
sub-image data provided by the graphics pipelines ($FB_j$, j=1, 2, 3,..., R). At the
Composition System 51 the sub-image's datum ($Z_j$, $P_j$) is provided to a set of R
corresponding AUs ($AU_j$, j=1, 2, 3,..., R), each of which is capable of handling
image pixels at the same X/Y coordinate of the screen. Each competing datum
is composed of contents $P_j$ (e.g. color, transparency, also referred to as
RGB-value herein) and depth of a pixel $Z_j$.

The Z-values are received by the AUs and introduced on the Depth
Competition Bus (DCB). The logical state of the DCB lines is sensed by the
AUs, which accordingly produce Carry-in and Stop-Mark vectors which are
used together with the Promotion Matrices (PM) 53 to determine whether
they hold the highest Z-value. The decision concerning a competing datum is
carried out locally at each AU, based on an associative mechanism, and on
comparison with other AUs on the DCB. Finally, the AU holding the closest
pixel (i.e., highest Z-value) is allowed to pass the pixel's color $P_j$ (RGB-value)
to the final raster 50, which constructs the final image 55.

The Depth Competition Bus (DCB) architecture intelligently deploys a
wired-AND logic, as shown and demonstrated in Figs. 7A and 7B. The functionality
of the wired-AND logic 70 is similar to the functionality of the regular logical
AND function. However, the Wired-AND function introduces numerous
outputs on a single electric point, whereas the regular logical AND gate must
output its signal to another gate, which is isolated from any other output. As
demonstrated in Fig. 7A, a logical "0" state on any one of the inputs forces a
logical "0" state on the output line.
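
As a minimal illustration of this behavior, the shared line can be modeled in Python as the AND of every state driven onto it. The function name `wired_and` is hypothetical, and an undriven line is assumed here to read as "1".

```python
def wired_and(driven_states):
    """Shared line modeled as wired-AND: it reads 1 only if no driver pulls it to 0."""
    return int(all(driven_states))


line = wired_and([1, 1, 0, 1])  # a single "0" forces the whole line to "0"
```

Every unit connected to the line senses the same resulting state, which is what allows each AU to compare its own output with the state forced by the others.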

The comparison process on the DCB is carried out in a bit-significance
successive manner. As shown in Fig. 7B, each Z-value ($Z_1, Z_2, Z_3, \ldots, Z_R$)
provided via the graphic pipeline ($FB_j$) is fed into the respective AU ($AU_j$).
The lines of the DCB are used for outputting the wired-AND results of each
segment of the N-segment binary Z-values ($Z_j^{(i)}$, i=0, 1, 2,..., N-1), according to their
level of significance. In this way the $DCB^{(0)}$ lines are used as outputs of the
wired-AND logic carried out on the Least Significant Segment (LSS, also
referred to as non-primary segment) of the Z-values ($Z_j^{(0)}$), and the $DCB^{(N-1)}$
lines are used for outputting the wired-AND carried out on the Most
Significant Segment (MSS, also referred to as the primary segment) of the
Z-values ($Z_j^{(N-1)}$).

The comparison process is carried out in an ordered fashion, starting from the
most significant bits of the Z-values, and it is finalized at the least significant
bits of the Z-values. The competition starts when the AUs output the Most
Significant Bit (MSB) on the uppermost line of $DCB^{(N-1)}$. The duration of this
process always takes a constant time of $\log_2 |Z|$, where |Z| is the depth
domain of the scene, i.e. the bit length of the Z-values. Consequently, the
multiple-stage structure of the prior art methods is replaced by a single
stage according to the method of the present invention. The performance
complexity of $O(\log_2 Z \cdot \log_2 R)$ of the prior art methods is significantly
reduced by the method of the present invention to $O(\log_2 Z)$.

In the comparison of the MSS bits $Z_j^{(N-1)}$, placing a single logical "0"
state, or any number of them, on the DCB lines $DCB^{(N-1)}$, forces a "0" logical
state on said lines. AUs which placed a "1" logical state on a DCB line, and
sensed a resulting "0" logical state on said line, terminate their competition
for their current Z-value; otherwise the AUs are permitted to continue their
competition to the next successive bit (less in significance), as exemplified in
Table 1.

Table 1: Comparison of the MSS bits $Z_j^{(N-1)}$.

Forced value    State of DCB line    Decision
"1"             "0"                  "Stop"
"1"             "1"                  "Continue"
"0"             "0"                  "Continue"
"0"             "1"                  "Continue"

It should be noted that the last case shown in Table 1 above is actually not
feasible, since the forcing of a logical "0" state on a DCB line must force this
line to a logical "0" state.

The decision as to whether the Z-value ought to continue competing is
established by each AU by sensing the logical state of the DCB lines. The last
surviving AU "wins" the competition. When the comparison logic of the AUs
identifies a higher Z-value on the bus, it detaches itself from the competition.
Otherwise it keeps competing until the value remaining on the bus is the one
with the highest Z-value.
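
The bit-by-bit competition can be simulated in Python as sketched below. This is an illustrative model, not the circuit itself: the function name `compete` is hypothetical, the bits are processed sequentially from MSB to LSB, and each AU is assumed to drive the inverse of its current bit onto the wired-AND line, so that an AU whose bit is 0 forces "1" and stops when it senses "0".

```python
def compete(z_values, bit_length):
    """Return the indices of the AUs that survive the wired-AND competition."""
    active = [True] * len(z_values)
    for bit in range(bit_length - 1, -1, -1):  # most significant bit first
        bits = [(z >> bit) & 1 for z in z_values]
        # Active AUs drive the inverse of their bit; the line is their wired-AND.
        line = int(all(not b for b, a in zip(bits, active) if a))
        for k in range(len(z_values)):
            # Forced value "1" (own bit is 0) while sensing "0": stop competing.
            if active[k] and bits[k] == 0 and line == 0:
                active[k] = False
    return [k for k, a in enumerate(active) if a]


survivors = compete([5, 12, 9, 12, 3], bit_length=4)
```

Only the AUs holding the highest value (here the two holding 12) remain active after the last bit; all others detach at the first bit where a higher value is present on the bus.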

Fig. 8 is a block diagram illustrating the AU operation. The k-th Z-value
($Z_k^{(N-1)}, Z_k^{(N-2)}, \ldots, Z_k^{(0)}$) is wired to the DCB via a set of gates (90, shown in Fig.
9), and the pixel value $P_k$ is gated through to the merged FB 50 via gate 84.
The Associative Logic 80 enables the Wired-AND functioning, controls the
unit's competition, and allows the RGB-value $P_k$ to pass through to FB 50
utilizing an enabling indication 81, upon accepting winning acknowledge 86
($W_k$) from one of the PMs 53.

The AU generates a Stop Mark (SM) vector 85, $SM_k = (SM_k^{(0)}, SM_k^{(1)}, \ldots, SM_k^{(N-2)})$,
which is generated by the Associative Logic 80 for the LSSs of the Z-value
($Z_k^{(i)}$, i=0, 1, 2,..., N-2) and provided thereafter to the PM 53. A Carry-Out
indication ($C_k^{(N-1)}$) is also produced by the $AU_k$, which indicates whether the
MSS ($Z_k^{(N-1)}$) of the Z-value $Z_k$ won the first stage of the competition.

It should be noted that the associative logic 80 may be designed to extend the
competition functionality of the AU in various ways. For instance, the inverse
of the incoming Z-value may be used (in all AUs) for the competition, and in
such case the competition between the AUs will be performed on the basis of
determining the smallest depth value. Alternatively, one may prefer to place
on Z-value inputs of the AUs a threshold value, and in this way to enable the
competition of only those Z-values which are greater than, or smaller than,
the threshold value.
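
Both extensions can be sketched in Python as follows. The helper names `invert`, `smallest_via_max` and `above_threshold` are hypothetical; the "inverse" is assumed here to be the bitwise complement within the Z-value's bit length, and a plain `max` stands in for the hardware competition.

```python
def invert(z, bit_length):
    """Bitwise complement of z within bit_length bits."""
    return ~z & ((1 << bit_length) - 1)


def smallest_via_max(z_values, bit_length):
    """Find the smallest Z-values by running a 'greatest' search on the inverses."""
    inverted = [invert(z, bit_length) for z in z_values]
    top = max(inverted)
    return [k for k, v in enumerate(inverted) if v == top]


def above_threshold(z_values, threshold):
    """Indices of the Z-values allowed to compete when only values above threshold do."""
    return [k for k, z in enumerate(z_values) if z > threshold]


smallest = smallest_via_max([5, 12, 3, 9], bit_length=4)
eligible = above_threshold([5, 12, 3, 9], threshold=4)
```

Because complementing reverses the numerical order of fixed-length binary numbers, the same "greatest value" competition then selects the smallest depth.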

At the local AU each Z-value $Z_j$ is segmented into N segments
$Z_j^{(N-1)}, Z_j^{(N-2)}, \ldots, Z_j^{(0)}$, where the (N-1)th segment $Z_j^{(N-1)}$ holds the MSBs, and the
(0) segment $Z_j^{(0)}$ holds the LSBs of $Z_j$. For example, let us assume a 32-bit
Z-value, processed in 3 segments (i.e., N=3; $Z_j^{(2)}$, $Z_j^{(1)}$, and $Z_j^{(0)}$),
where the first and second segments ($Z_j^{(2)}$ and $Z_j^{(1)}$) are each 8 bits long and
the third segment ($Z_j^{(0)}$) is 16 bits long. In the first stage of the competition
one or more preliminary winners are determined according to the MSS (the
first segment) of the Z-values ($Z_j^{(N-1)}$, j=1, 2,..., R), and Stop-Marks grading
($SM_j^{(i)}$, i=0, 1, 2,..., N-2, e.g., $SM_j^{(1)}$ and $SM_j^{(0)}$ for N=3) is established
according to the competition between all the other segments (LSSs) of the
Z-values ($Z_j^{(i)}$, i=0, 1, 2,..., N-2, e.g., $Z_j^{(1)}$ and $Z_j^{(0)}$ for N=3). In the next step of
the competition the Stop-Marks gradings $SM_j^{(N-2)}$ (e.g., $SM_j^{(1)}$), which were
established for the second segment ($Z_j^{(N-2)}$, e.g., $Z_j^{(1)}$) and which correspond to the
Z-values which won the first stage, are examined to determine which of those
Z-values continue to compete. The same process is carried out with the
Stop-Marks grading established for the next segments ($SM_j^{(i)}$, i=0, 1, 2,..., N-3,
e.g., $SM_j^{(0)}$ for N=3), until the highest Z-value is determined according to the
results of the last segment (the LSS, $Z_j^{(0)}$).
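
The 8/8/16 segmentation of the example can be written out in Python as below. The function name `segment_z` and the dictionary layout (segment index mapped to segment value) are assumptions made for illustration only.

```python
def segment_z(z, seg_bits=(8, 8, 16)):
    """Split a Z-value into N segments; index N-1 is the MSS and index 0 the LSS."""
    n = len(seg_bits)
    shift = sum(seg_bits)
    segments = {}
    for pos, bits in enumerate(seg_bits):
        shift -= bits
        # Mask out `bits` bits starting at the current shift position.
        segments[n - 1 - pos] = (z >> shift) & ((1 << bits) - 1)
    return segments


parts = segment_z(0x12345678)
```

For the 32-bit value 0x12345678 this gives segment 2 = 0x12, segment 1 = 0x34 and segment 0 = 0x5678.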

Each AU processes all the segments ($Z_j^{(i)}$, i=0, 1,..., N-1) in parallel. The AUs
control the entire Z-value numbers, according to the segmentation used from
MSB to LSB. While the AU logic lets the first segment $Z_j^{(N-1)}$ compete and
submit its Carry-Out $C_j^{(N-1)}$ to $PM^{(N-1)}$ (Fig. 6), it lets the second segment
$Z_j^{(N-2)}$ compete and submit its Stop-Marks $SM_j^{(N-2)}$ to $PM^{(N-1)}$. Similarly, the
third segment $Z_j^{(N-3)}$ of the Z-value competes and submits Stop-Marks grading
$SM_j^{(N-3)}$ to $PM^{(N-2)}$, etc.

The segment length is chosen to optimize the competition performance. For
example, for 32-bit Z-values, processed in 3 segments, $Z_j^{(2)}$, $Z_j^{(1)}$ and
$Z_j^{(0)}$, of 8, 8, and 16 bits respectively, the SM vectors for the second segments
$SM^{(1)}$ can be prepared while the first segment $Z_j^{(2)}$ is processed, and the SM
vectors for the third (the longest) segment $SM^{(0)}$ can be concurrently
prepared taking advantage of the time period required for the two
previous segments, $Z_j^{(2)}$ and $Z_j^{(1)}$.

This process is illustrated in Fig. 6. Numerals 61-64 schematically designate
the indications generated by the AUs according to the segmentation of the
Z-values. There are N-1 PMs 53 serving the LSSs of the Z-values ($Z_j^{(i)}$, i=0, 1,
2,..., N-2). The PMs 53 generate the Carry-Out vectors
$C^{(i)} = (C_1^{(i)}, C_2^{(i)}, \ldots, C_R^{(i)})$, which are determined according to the respective
Stop-Mark vectors $SM^{(i)} = (SM_1^{(i)}, SM_2^{(i)}, \ldots, SM_R^{(i)})$ generated by the AUs for the
corresponding (i-th) segment, and the Carry-Out vector $C^{(i+1)}$ which was
produced by the $PM^{(i+1)}$ in the previous stage. Additional signals generated by
the PMs are (a) the winning indication $W_j$, which designates the winning
Z-values in each stage of the competition, and (b) a "stop competition" signal,
which is generated once a single winner is determined and which prevents
subsequent matrices from carrying on their competition.
The AU's logic for generating the Carry indications for the first segment
$Z_j^{(N-1)} = (Z_{j,n-1}^{(N-1)}, Z_{j,n-2}^{(N-1)}, \ldots, Z_{j,0}^{(N-1)})$ of the Z-values is shown in Fig. 9. To exploit the
Wired-AND functionality the inverse state of each bit is introduced to the
DCB lines via the logical NAND gates 90. For each examined bit $Z_{j,m}^{(N-1)}$ a
logical OR gate 92 is used for determining if the Z-value continues to compete
in the next bit level $Z_{j,m-1}^{(N-1)}$ according to the logical state of the respective DCB
line and the logical state of the examined bit $Z_{j,m}^{(N-1)}$. Each bit stage $Z_{j,m}^{(N-1)}$
controls the next bit stage $Z_{j,m-1}^{(N-1)}$ via the logical AND gates 92. The Carry-Out
indication $C_j^{(N-1)}$ is generated only if all n bit stages survived the competition.
The Carry-Out indication is provided to the $PM^{(N-1)}$ Promotion Matrix shown
in Fig. 6, and in this way enables further competition of the Z-value $Z_j$ in the
next segment.

Simultaneously, while the AUs examine the first segments of the Z-values,
each of the LSSs ($Z_j^{(i)}$, i=0, 1, 2,..., N-2) is also examined by wired-AND logic.
However, in the examination of the LSSs Stop-Mark $SM_j^{(i)}$ (i=0, 1, 2,..., N-2)
signals are generated, instead of the Carry-Out $C_j^{(N-1)}$ indications which were
generated for the first segment. Each Stop-Mark $SM_j^{(i)}$ signal is forwarded to
the respective $PM^{(i)}$ Promotion Matrix as part of the Stop-Mark vector $SM^{(i)}$.

A Stop-Mark $SM_j^{(i)}$ indicates the "weak" bit of the respective segment $Z_j^{(i)}$
that potentially drops the entire Z-value $Z_j$ out of the competition. The
logic for generating the SM signals for the LSSs ($SM_j^{(i)}$, i=0, 1, 2,..., N-2) is
shown in Fig. 10. In principle, this logic is similar to the logic used for the
generation of the Carry-Out vector ($C^{(N-1)}$). However, it differs in that each bit
stage can generate a Stop-Mark signal, "stop 1" - "stop (n+1)", via inverters
99. For each LSS segment ($Z_j^{(i)}$, i=0, 1, 2,..., N-2) only one Stop-Mark signal
$SM_j^{(i)}$ is generated. The highest possible Stop-Mark signal "stop (n+1)"
indicates that the examined segment did not fail in any of its wired-AND
comparisons.
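
The Stop-Mark generation for one segment can be sketched in Python as follows. This is a behavioral model under assumptions, not the gate-level logic of Fig. 10: the function name `stop_marks` is hypothetical, "stop m" is assumed to denote a failure at the m-th bit stage counted from the segment's most significant bit, and each AU is assumed to drive the inverse of its bit onto the wired-AND line.

```python
def stop_marks(segments, n_bits):
    """Return one stop number per AU, from 1 to n_bits + 1, for an n_bits segment."""
    active = [True] * len(segments)
    marks = [n_bits + 1] * len(segments)  # "stop (n+1)": never failed
    for stage in range(1, n_bits + 1):
        bit_pos = n_bits - stage  # stage 1 examines the segment's top bit
        bits = [(s >> bit_pos) & 1 for s in segments]
        # Wired-AND of the inverted bits driven by the still-active AUs.
        line = int(all(not b for b, a in zip(bits, active) if a))
        for k in range(len(segments)):
            if active[k] and bits[k] == 0 and line == 0:
                active[k] = False
                marks[k] = stage  # the "weak" bit of this segment
    return marks


marks = stop_marks([0b1010, 0b1001, 0b0111, 0b1010], n_bits=4)
```

In this model a segment's stop number is one more than the count of leading bits it shares with the largest segment on the bus, so the segments equal to that largest value (here the two holding 0b1010) receive the highest mark, n+1 = 5.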

The logic of the Associative Matrices handles the Stop-Mark vector $SM^{(i)}$ and
the previously generated Carry-Out vector $C^{(i+1)}$, and generates a new
Carry-Out vector $C^{(i)}$. In this new Carry-Out vector ($C^{(i)}$) only those AUs
which survived the competition so far are participating. If just a single AU
survived, it becomes the final winner, discontinuing the competition process.
Otherwise the next PM ($PM^{(i-1)}$) performs the same task, until a single winner
is left.

Fig. 11 is a block diagram illustrating the logic of a PM (reduced case). For
the sake of simplicity Fig. 11 illustrates a PM (i.e., PM^(i)) serving two
AUs (R=2), wherein the i-th segment is 4 bits long (i.e., 5 Stop-Marks,
SM_j^(i) = {SM_{j,1}^(i), SM_{j,2}^(i),..., SM_{j,5}^(i)}). Each of the
Stop-Mark vectors SM_1^(i) and SM_2^(i) sets on one of the FFs 110 and 111.
No more than one FF 110 and 111 can be in "ON" state in a row. The Stop-Mark
vector for which a winning indication C_j^(i+1) is received from the previous
PM (PM^(i+1)) will generate a Carry-Out if, and only if, there is no other
Stop-Mark (in another row) with a higher number and having a Carry-Out
indicating winning in the previous PM (PM^(i+1)).

The operation of previous columns in the PM is disabled via the Disable
Function 113 upon receipt of a corresponding indication from the logical AND
gates 117 gathered via the logical OR gate 119 of the column in which a
Stop-Mark indication having the highest level is received, and for which a
corresponding Carry-Out indication is received from the previous PM. If the
Stop-Marks received by the PM are of the same significance (e.g., SM_{1,k}^(i)
and SM_{2,k}^(i)), Carry-Out indications C_1^(i) and C_2^(i) are provided to
the next PM (PM^(i-1)) via buffers 112.

Only one winning W^(i) signal can be produced by one of the PMs. Whenever
Detector 115 indicates that a single winner was determined in the current
stage, the Disable Function 113 produces a Stop^(i) indication which will
disable further processing by the PM of the next stage, PM^(i-1). Whenever a
Stop signal is received by the Disable Function (e.g., Stop^(i+1)) it disables
the functioning of the current and the following PMs by disabling the gates
117 and by issuing a Stop indication (e.g., Stop^(i)) to the Disable Function
of the following PM.

For example, assume that AU1 sets on Stop-Mark 4, SM_{1,4}^(i), AU2 sets on
Stop-Mark 2, SM_{2,2}^(i), and that both Carry indications, C_1^(i+1) and
C_2^(i+1), received from the previous PM indicate winning in the previous
stage of the competition. In such a case the Z-value competing in AU1 wins,
disables columns 1-3 via the Disable Function 113, and generates the only
Carry-Out, C_1^(i). Detection of a single Carry-Out, indicating a single
winner at the current stage, results in generating a win acknowledge signal
W_1^(i) via the Single Carry-Out detector 115, which is provided on the W1
line to AU1. The winning AU is then enabled to provide its RGB value P1 to FB
50.

If, for instance, AU1 and AU2 both turn on Stop-Mark 4 (SM_{1,4}^(i) and
SM_{2,4}^(i)), and the Carry indications C_1^(i+1) and C_2^(i+1) indicate that
both Z-values won in the previous stage, then the two Carry-Outs C_1^(i) and
C_2^(i) transferred to the next PM (PM^(i-1)) will also indicate winning in
the current stage.
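
The decision taken by a single PM in the two cases above can be sketched in
software as follows. This is a minimal functional model and not the
flip-flop and gate structure of Fig. 11; the function name, the list-based
inputs, and the zero-based AU index it returns are assumptions made for
illustration.

```python
def promotion_matrix_step(stop_marks, carry_in):
    """Model one PM stage.

    stop_marks: Stop-Mark number set on by each AU for this segment.
    carry_in:   Carry indication per AU received from the previous PM.
    Returns (carry_out, winner), where winner is the zero-based index of
    the AU if exactly one Carry-Out is produced, otherwise None.
    """
    eligible = [m for m, c in zip(stop_marks, carry_in) if c]
    if not eligible:
        return [False] * len(stop_marks), None
    best = max(eligible)  # highest Stop-Mark among AUs that still compete
    carry_out = [c and m == best for m, c in zip(stop_marks, carry_in)]
    winner = carry_out.index(True) if carry_out.count(True) == 1 else None
    return carry_out, winner


print(promotion_matrix_step([4, 2], [True, True]))   # AU1 alone carries out
print(promotion_matrix_step([4, 4], [True, True]))   # both carry out
```

A Stop-Mark that arrives without a winning Carry indication from the previous
PM is ignored in this model, in line with the "if, and only if" condition
stated for Fig. 11.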
Fig. 12 exemplifies the competition process of R=5 Z-values (Z1, Z2, Z3, Z4,
and Z5) belonging to AUs a through e. In this example the depth measure of
the Z-values is of 32 bits, and the depth values are segmented into N=4
segments (Z^(3), Z^(2), Z^(1), and Z^(0)) of 8 bits each. The total processing
time in this example is 11 time units, while a sequential wired-AND process
(without Promotion Matrices) would take 32 time units.

In the first segment in this example, the MSSs Z_2^(3), Z_3^(3), and Z_5^(3)
are all equal and greater than Z_1^(3) and Z_4^(3), and therefore only the
corresponding C_2^(3), C_3^(3), and C_5^(3) Carry-Out signals are produced, to
indicate that the Z-values Z2, Z3, and Z5 won the first stage. At the same
time, the SM vectors of the LSSs are produced by the AUs.

As for the second segment of the Z-values, the 6 MSBs of the Z^(2) segments
are all equal. A "Stop 7" SM is indicated for one of the segments, and it does
not further compete, since the state of its 7th bit is "0" while the state of
the 7th bit of all other Z-values in the segment is "1". A "Stop 8" SM is
indicated for another segment, which also terminates further competition,
since the state of its 8th bit is "0" while the state of the 8th bit of the
three values whose competition proceeds in this bit stage is "1".
Consequently, "Stop 9" SM indications are produced for the remaining three
segments, since they won in each and every bit stage in the segment.
Accordingly, the processing of the SM^(2) and C^(3) vectors in PM^(2) will
produce two Carry-Out indications to the next PM, PM^(1).

As for the third segment of the Z-values, a "Stop 2" SM is indicated for one
of the Z^(1) segments, which stops any further competing since the state of
its examined bit is "0" while the state of the corresponding bit of the other
segments is "1", and a "Stop 9" SM is indicated for the other four segments,
since the values of their 6 MSBs are in equality. Accordingly, the processing
of the SM^(1) and C^(2) vectors in PM^(1) will produce a single Carry-Out
indication C_2^(1) to the last PM, PM^(0). Since PM^(1) determined a single
winner, its detector 115 generates a corresponding indication W2 to the
winning AU, AU2, which enables its RGB value P2 into FB 50. Consequently, the
Disable Function of PM^(1) generates a Stop^(1) indication which disables
further processing in the last PM, PM^(0).

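The processing of the last segment is not carried out. Nevertheless, SM^(0)
indications are produced by the AUs. A "Stop 1" SM is indicated for one of the
Z^(0) segments, since the state of its 1st bit is "0" while the state of the
1st bit of the other four segments is "1". A "Stop 7" SM is indicated for two
of the remaining segments, since bits r=2, 3, 4, 5, and 6 of the remaining
segments are all "1", while the 7th bit of these two segments is "0" and the
7th bit of the others is "1". Consequently, a "Stop 8" SM is indicated for one
of the last two segments and a "Stop 9" SM is indicated for the other, since
their 8th bits are "0" and "1", respectively.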

The competition time can be further reduced by merging the SM results of
segments, while all segments are kept uniform in length. Such reduction
allows clustering of results prior to the arrival of the Carry-Out indications
from the previous PM. This approach further reduces the complexity from
O(log2 Z) to O((log2 Z)/k), where k is a folding factor. For example, assuming
Z = 2^32, a sequential wired-AND process would take complexity of O(32).
However, using 4 PMs of 8 bits each, the second half of the number is being
"folded" at the time of processing the first half. As a result, the complexity
is reduced to O(8+1+1). In this case the folding factor k is 32/10 = 3.2. In
case of longer numbers of e.g. 64 bits, the order of complexity is not
significantly changed: O(8+1+1+1). The advantage of this parallel approach is
in that any bit length of Z-value numbers can be processed at almost the same
short time, while keeping high efficiency.
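
The folding-factor arithmetic can be checked with a short calculation. The
sketch below only divides the bit length by the number of time units; the
counts 10 and 11 are read off the expressions O(8+1+1) and O(8+1+1+1) quoted
above, and the function name is hypothetical.

```python
def folding_factor(total_bits, time_units):
    """Ratio between sequential bit stages and folded time units (k)."""
    return total_bits / time_units


print(folding_factor(32, 10))             # 32-bit case quoted in the text
print(round(folding_factor(64, 11), 2))   # 64-bit case, taking 8+1+1+1 = 11
```

Under this reading the folding factor grows with the bit length, while the
number of time units changes only slightly.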
Fig. 13 is a block diagram illustrating a chip implementation 140 of a
preferred embodiment of the invention (e.g., VLSI). This example illustrates
an implementation for compositing 6 FBs from 6 different GPUs. This
implementation realizes a compositing unit for simultaneously composing a
plurality of sub-image pixels by a plurality of Sub-Image Units (SIU). Each
SIU comprises a set of AUs corresponding to the number of GPUs, a DCB, a
PM, and Control Logic 141. The Control Logic 141 at each SIU sorts out, from
the data input stream retrieved via the Port1-Port6 input ports, only those
pixels that match the coordinates of the respective sub-image. Each SIU
outputs the RGB data of one of the sub-images, which is outputted to FB 50
via output port 142.

The entire compositing process is further parallelized by dividing each FB 
into 16 sub-images. For example, for an image having resolution of 
1024x1024 pixels, each Sub-Image Unit (SIU) processes a 64x64 sub-image 
(1/16 of the image). If for example the pixels' color data is 24 bits long, the 
output of the stack of SIUs includes 12KBytes of winning pixels color data. 
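
The figures quoted above can be checked with a short calculation. This is
plain arithmetic on the stated 64x64 sub-image and 24-bit color depth; the
helper names are hypothetical.

```python
def sub_image_bytes(width, height, bits_per_pixel):
    """Color data held by one sub-image, in bytes."""
    return width * height * bits_per_pixel // 8


def tiles_per_side(image_side, tile_side):
    """How many sub-images of the given side fit along one image side."""
    return image_side // tile_side


print(sub_image_bytes(64, 64, 24) // 1024)  # KBytes of color data per 64x64 sub-image
print(tiles_per_side(1024, 64))             # 64-pixel sub-images along a 1024-pixel side
```

A 64x64 sub-image of 24-bit pixels therefore holds 12 KBytes of color data,
and 16 such sub-images fit along each 1024-pixel side of the image.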

As opposed to the prior art, the present invention allows carrying out a
single merging step for any number of GPUs R, as described in Fig. 5. The
hierarchical structure of the prior art method has been replaced in the
present invention by a unique, flat, and single step structure. The
performance of this new structure is insensitive to the level of parallelism,
i.e., the number of participating GPUs. Composition time is practically
reduced to a single comparison and any arbitrary number of GPUs is allowed,
with no sacrifice to the overall performance.

The above examples and description have of course been provided only for the
purpose of illustration, and are not intended to limit the invention in any
way. As will be appreciated by the skilled person, the invention can be
carried out in a great variety of ways, employing techniques different from
those described above, all without exceeding the scope of the invention.
CLAIMS

1. A method for detecting the greatest number from a plurality of
Numbers Z_1, Z_2,..., Z_R, comprising:

a) dividing each of said Numbers into two or more binary Segments
Z_j^(N-1), Z_j^(N-2),..., Z_j^(0), where the bit length of said Segments is
determined according to their level of significance and where sets of said
Segments are arranged according to their level of significance wherein the
first set of Segments Z_1^(N-1), Z_2^(N-1),..., Z_R^(N-1) includes the Most
Significant Segments of said Numbers and the last set of Segments Z_1^(0),
Z_2^(0),..., Z_R^(0) includes the Least Significant Segments of said Numbers;

b) simultaneously comparing the numerical values of the Segments
Z_1^(i), Z_2^(i),..., Z_R^(i) having the same level of Significance,
determining a group designating the Numbers for which the numerical value of
their Most Significant Segment is the greatest, and evaluating for the Least
Significant Segments a Grade indicating their numerical size in
comparison with the numerical value of the other Segments of the same
level of significance;

c) starting from the second set of Segments Z_1^(N-2), Z_2^(N-2),...,
Z_R^(N-2), comparing the Grades of the Segments of the Numbers which
correspond to said group, and removing from said group any Number indication
with a Grade that is less than the highest Grade which corresponds to another
Number indication in said group;

d) repeating step c) until the last set of Segments Z_1^(0), Z_2^(0),...,
Z_R^(0) is reached or until a single Number is designated by said group.

2. A method according to claim 1, wherein the Numbers are the depth
values of pixels of multiple three-dimensional raster images.

3. A method according to claim 1, further comprising comparing the
Numbers with a threshold value and carrying out the detection of the
greatest number only with the Numbers whose value is above or below
said threshold value.

4. A method according to claim 1, for determining the smallest number,
wherein the group is determined to designate the Numbers for which the
numerical value of their Most Significant Segment is the smallest and the
Number designations are removed from said group whenever their Grade is
greater than the smallest Grade which corresponds to another Number
indication in said group.

5. A method according to claim 1, wherein all the segments are of the
same bit length.

6. A method according to claim 1, wherein the bit length of one or more of 
the Least Significant Segments is greater than the bit length of the Most 
Significant Segment. 

7. A method for compositing a plurality of three-dimensional Sub-Images
by examining the Depth values Z_1, Z_2,..., Z_R of the Pixels corresponding to
the same spatial location in each Sub-Image and compositing the content of the
Pixel having the greatest Depth value, comprising:

a) dividing each of said Depth values into two or more binary
Segments Z_j^(N-1), Z_j^(N-2),..., Z_j^(0), where the bit length of said
Segments is determined according to their level of significance and where sets
of said Segments are arranged according to their level of significance wherein
the first set of Segments Z_1^(N-1), Z_2^(N-1),..., Z_R^(N-1) includes the Most
Significant Segments of said Depth values and the last set of Segments
Z_1^(0), Z_2^(0),..., Z_R^(0) includes the Least Significant Segments of said
Depth values;

b) simultaneously comparing the numerical values of the Segments
Z_1^(i), Z_2^(i),..., Z_R^(i) having the same level of Significance,
determining a group designating the Depth values for which the numerical value
of their Most Significant Segment is the greatest, and evaluating for the
Least Significant Segments a Grade indicating their numerical size in
comparison with the numerical value of the other Segments of the same
level of significance;

c) starting from the second set of Segments Z_1^(N-2), Z_2^(N-2),...,
Z_R^(N-2), comparing the Grades of the Segments of the Depth values which
correspond to said group, and removing from said group any Depth value
indication with a Grade that is less than the highest Grade which
corresponds to another Depth value in said group;

d) repeating step c) until the last set of Segments Z_1^(0), Z_2^(0),...,
Z_R^(0) is reached or until a single Depth value is designated by said group.

8. A method according to claim 7, further comprising comparing the
Depth values with a threshold value and carrying out the detection of the
greatest number only with the Depth values whose value is above or
below said threshold value.

9. A method according to claim 7, for determining the smallest number, 
wherein the group is determined to designate the Depth values which the 
numerical value of their Most Significant Segment is the smallest and the 
Depth values designations are removed from said group whenever their 
Grade is greater than the smallest Grade which corresponds to another 
Number indication in said group. 

10. A method according to claim 7, wherein all the segments are of the
same bit length.

11. A method according to claim 7, wherein the bit length of one or more of 
the Least Significant Segments is greater than the bit length of the Most 
Significant Segment. 
12. A system for compositing a plurality of three-dimensional Sub-Images,
comprising:

a) Bus lines for concurrently introducing the bits of a plurality of
Depth values of pixels, where on each Bus line the bits having the same
level of significance are introduced, the logical state of said lines is set to
"1" whenever the logical state of all of said bits is "1", and it is set to "0" if
the logical state of at least one of said bits is "0";

b) Associative Units for concurrently reading the data of the pixels
corresponding to the same spatial location in said Sub-Images, dividing the
Depth value of each read pixel into two or more segments, introducing said
segments on the respective lines of said Bus, sensing the logical state of
said lines, and accordingly concurrently producing intermediate
comparison results for the Most Significant Segments of said values which
designates the Depth values having the greatest numerical value, and for
the Least Significant Segments Stop-Marks Grading indicating their
numerical size in comparison with the numerical value of the other
Segments of the same level of significance;

c) Promotion Matrices for serially producing intermediate
comparison results for each subsequent set of segments in order of
significance, starting from the set of segments following the set of Most
Significant Segments, by removing from the previously produced
intermediate comparison results Depth value designations for which the
corresponding Stop-Mark Grading is less than the greatest Stop-Mark
Grading that is related to one of said intermediate comparison results,

where said Promotion Matrices are capable of indicating that the
currently produced intermediate comparison results includes a single
designation such that the pixel data can be retrieved for the compositing
from the respective Associative Unit.
13. A system according to claim 12, further comprising disabling means for
disabling the operation of subsequent Promotion Matrices whenever it is
indicated by a Promotion Matrix that the currently produced intermediate
comparison results includes a single designation.

14. The system of claim 12, wherein the system is implemented on a single 
integrated circuit chip. 

15. The system of claim 12, wherein the chip is a VLSI implementation. 

16. An Associative Unit for introducing the bits of segments of a Depth
value of a pixel on the lines of a Wired-AND Bus, issuing Carry-Out and
Stop-Mark indications, and enabling the data of said pixel according to a
corresponding external enabling indication, comprising:

a) Primary Segment Logic circuitry for enabling the introducing of
the bits of the Most Significant Segment of said Depth value on the
respective lines of said Bus, sensing the logical state of said lines starting
from the Most Significant line, and if the logical states of the sensed line
and of the corresponding bit are "0" disabling the sensing of the consecutive
Bus lines, otherwise enabling the sensing to proceed until the end of said
Segment and issuing a Carry-Out indication;

b) One or more Non-Primary Segment logic circuitries for enabling
the introducing of the bits of the Least Significant Segments of said Depth
value on the respective lines of said Bus, sensing the logical state of said
lines starting from the Most Significant line, and if the logical states of the
sensed line and of the corresponding bit are "0" disabling the sensing of the
consecutive Bus lines and issuing a Stop-Mark indication which
corresponds to the level of significance of said bit in its Segment, otherwise
enabling the sensing to proceed until the end of said Segment and issuing
a Stop-Mark indication having level of significance being one level higher
than the Most Significant bit in said Segment; and

c) a gate for enabling the output of said data whenever said
enabling indication is received,

where the logical state of each line of said Bus is set to "1" whenever the
logical state of all of the bits introduced on it is "1", and it is set to "0" if the
logical state of at least one of said bits is "0", and where said enabling
indication is determined externally according to said Carry-Out and Stop-
Mark indications.

17. An Associative Unit according to claim 16, further comprising means 
for enabling the operation of the Primary and Non-Primary Segment logic 
circuitries whenever the value of the Depth value is greater than a threshold 
value. 

18. An Associative Unit according to claim 17, wherein the means are 
enabling the operation of the Primary and Non-Primary Segment logic 
circuitries whenever the value of the Depth value is smaller than a threshold 
value. 



19. An Associative Unit according to claim 16, in which the sensing of the 
consecutive Bus lines is disabled whenever the logical states of the sensed
line and of the corresponding bit is 